Unidimensional Clustering of Discrete Data Using Latent Tree Models

نویسندگان

  • April H. Liu
  • Leonard K. M. Poon
  • Nevin Lianwen Zhang
چکیده

This paper is concerned with model-based clustering of discrete data. Latent class models (LCMs) are usually used for the task. An LCM consists of a latent variable and a number of attributes. It makes the overly restrictive assumption that the attributes are mutually independent given the latent variable. We propose a novel method to relax the assumption. The key idea is to partition the attributes into groups such that correlations among the attributes in each group can be properly modeled by using one single latent variable. The latent variables for the attribute groups are then used to build a number of models and one of them is chosen to produce the clustering results. Extensive empirical studies have been conducted to compare the new method with LCM and several other methods (K-means, kernel Kmeans and spectral clustering) that are not model-based. The new method outperforms the alternative methods in most cases and the differences are often large.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Model-based multidimensional clustering of categorical data

Existing models for cluster analysis typically consist of a number of attributes that describe the objects to be partitioned and one single latent variable that represents the clusters to be identified. When one analyzes data using such a model, one is looking for one way to cluster data that is jointly defined by all the attributes. In other words, one performs unidimensional clustering. This ...

متن کامل

The Analysis of Bayesian Probit Regression of Binary and Polychotomous Response Data

The goal of this study is to introduce a statistical method regarding the analysis of specific latent data for regression analysis of the discrete data and to build a relation between a probit regression model (related to the discrete response) and normal linear regression model (related to the latent data of continuous response). This method provides precise inferences on binary and multinomia...

متن کامل

Parameter Estimation in Spatial Generalized Linear Mixed Models with Skew Gaussian Random Effects using Laplace Approximation

 Spatial generalized linear mixed models are used commonly for modelling non-Gaussian discrete spatial responses. We present an algorithm for parameter estimation of the models using Laplace approximation of likelihood function. In these models, the spatial correlation structure of data is carried out by random effects or latent variables. In most spatial analysis, it is assumed that rando...

متن کامل

The Comparison of Two Models for Evaluation of Pre-internship Comprehensive Test: Classical and Latent Trait

Introduction: Despite the widespread use of pre-internship comprehensive test and its importance in medical students’ assessment, there is a paucity of the studies that can provide a systematic psychometric analysis of the items of this test. Thus, the present study sought to assess March 2011 pre-internship test using classical and latent trait models and compare their results. Methods: In th...

متن کامل

Experimental Evaluation of Algorithmic Effort Estimation Models using Projects Clustering

One of the most important aspects of software project management is the estimation of cost and time required for running information system. Therefore, software managers try to carry estimation based on behavior, properties, and project restrictions. Software cost estimation refers to the process of development requirement prediction of software system. Various kinds of effort estimation patter...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015